Code Tutorial
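import torch
import IPython.display as ipd

sr = 44100    # sample rate in Hz
duration = 5  # clip length in seconds
# Gaussian white noise shaped (channels, samples): one channel, sr * duration samples
audio_sample = torch.randn(1, sr * duration)
# Render an inline audio player (IPython normalizes the amplitude by default)
ipd.Audio(audio_sample.numpy(), rate=sr)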
Stable Audio Open Tutorial
Stable Audio Open is fully available through HuggingFace. To run it locally, you’ll first need a HuggingFace account and an access token ($HF_TOKEN), which you can generate by following https://huggingface.co/docs/huggingface_hub/en/quick-start#authentication. Once you have the token, export it as an environment variable with a bash command like:
export HF_TOKEN="YOUR_HF_TOKEN"
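Before going further, it can help to confirm from Python that the exported token is actually visible to the notebook kernel. The snippet below is a minimal sketch, not part of the original tutorial: the helper names `read_hf_token` and `mask_token` are hypothetical, and the only thing taken from the text above is the `HF_TOKEN` variable name.

```python
import os


def read_hf_token(env=None):
    """Return the token stored in HF_TOKEN, or raise a readable error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before launching the notebook."
        )
    return token


def mask_token(token, visible=4):
    """Hide all but the last `visible` characters so the token is safe to print."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]


if __name__ == "__main__":
    try:
        print("Found token:", mask_token(read_hf_token()))
    except RuntimeError as err:
        print(err)
```

If the check passes, the token can then be handed to `huggingface_hub.login(token=...)` if an explicit login is wanted. Note that a variable exported in one shell is only visible to a Jupyter server started from that same shell session.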
The rest of the tutorial closely follows the demos in the public Stable Audio Open resources.
First, we’ll install some dependencies if you don’t already have them. Stable-Audio-Tools can be a bit finicky to install directly, so we suggest making a dedicated virtual environment (not a conda environment) to run this notebook.
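Before running the install cell, one way to check which kind of environment the notebook kernel is using is sketched below. This is an illustrative heuristic rather than an official test: the function name `classify_environment` is hypothetical, and it relies on two conventions, namely that a `venv` makes `sys.prefix` differ from `sys.base_prefix`, and that conda environments contain a `conda-meta` directory.

```python
import os
import sys


def classify_environment(prefix=None, base_prefix=None):
    """Return 'venv', 'conda', or 'system' for the given interpreter prefixes.

    Both arguments default to the running interpreter's values; they are
    parameters only so the heuristic can be exercised without a real environment.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the Python installation it was created from.
    if prefix != base_prefix:
        return "venv"
    # Conda environments keep their package metadata in a conda-meta directory.
    if os.path.isdir(os.path.join(prefix, "conda-meta")):
        return "conda"
    return "system"


print("This interpreter runs in:", classify_environment())
```

If this reports `conda` or `system`, creating a fresh environment with `python -m venv` and selecting it as the notebook kernel is the setup the paragraph above recommends.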
!pip install torch torchaudio torchvision stable-audio-tools einops
Requirement already satisfied: torch in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (2.4.1)
Requirement already satisfied: torchaudio in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (2.4.1)
Requirement already satisfied: torchvision in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (0.19.1)
Requirement already satisfied: stable-audio-tools in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (0.0.16)
Requirement already satisfied: einops in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (0.7.0)
Requirement already satisfied: ptyprocess>=0.5 in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from pexpect>4.3->ipython->aeiou==0.0.20->stable-audio-tools) (0.7.0)
Requirement already satisfied: platformdirs>=2.5.0 in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from pooch>=1.0->librosa>=0.8.1->aeiou==0.0.20->stable-audio-tools) (4.3.2)
Requirement already satisfied: propcache>=0.2.0 in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from yarl<2.0,>=1.12.0->aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->stable-audio-tools) (0.2.0)
Requirement already satisfied: future>=0.16.0 in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from pyloudnorm->descript-audiotools>=0.7.2->descript-audio-codec==1.0.0->stable-audio-tools) (1.0.0)
Requirement already satisfied: fire in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from randomname->descript-audiotools>=0.7.2->descript-audio-codec==1.0.0->stable-audio-tools) (0.7.0)
Requirement already satisfied: executing>=1.2.0 in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from stack-data->ipython->aeiou==0.0.20->stable-audio-tools) (2.1.0)
Requirement already satisfied: asttokens>=2.1.0 in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from stack-data->ipython->aeiou==0.0.20->stable-audio-tools) (2.4.1)
Requirement already satisfied: pure-eval in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from stack-data->ipython->aeiou==0.0.20->stable-audio-tools) (0.2.3)
Requirement already satisfied: absl-py>=0.4 in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from tensorboard->descript-audiotools>=0.7.2->descript-audio-codec==1.0.0->stable-audio-tools) (2.1.0)
Requirement already satisfied: grpcio>=1.48.2 in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from tensorboard->descript-audiotools>=0.7.2->descript-audio-codec==1.0.0->stable-audio-tools) (1.67.1)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from tensorboard->descript-audiotools>=0.7.2->descript-audio-codec==1.0.0->stable-audio-tools) (0.7.2)
Requirement already satisfied: werkzeug>=1.0.1 in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from tensorboard->descript-audiotools>=0.7.2->descript-audio-codec==1.0.0->stable-audio-tools) (3.1.3)
Requirement already satisfied: mdurl~=0.1 in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from markdown-it-py->panel>=1.0->holoviews->aeiou==0.0.20->stable-audio-tools) (0.1.2)
Requirement already satisfied: webencodings in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from bleach->panel>=1.0->holoviews->aeiou==0.0.20->stable-audio-tools) (0.5.1)
Requirement already satisfied: termcolor in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from fire->randomname->descript-audiotools>=0.7.2->descript-audio-codec==1.0.0->stable-audio-tools) (2.5.0)
Requirement already satisfied: uc-micro-py in /Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages (from linkify-it-py->panel>=1.0->holoviews->aeiou==0.0.20->stable-audio-tools) (1.0.3)
Using cached argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Installing collected packages: argparse
Successfully installed argparse-1.4.0
If you are running this locally, you can simply set HF_TOKEN in your local environment (as done below). If you are using a Colab notebook, you first need to add your HF_TOKEN as a “secret” in Colab, and the command below won’t have any effect in that case.
import os
import warnings
os.environ['HF_TOKEN'] = 'Your API key'  # placeholder: replace with your own Hugging Face token
warnings.filterwarnings('ignore', category=FutureWarning)
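One way to cover both the local and the Colab case in a single cell is to look the token up at runtime. The helper below is only a sketch and is not part of the Stable Audio Open demo: `resolve_hf_token` is a hypothetical name, and it assumes the Colab secret was saved under the name `HF_TOKEN` with notebook access enabled.

```python
import os


def resolve_hf_token(secret_name="HF_TOKEN"):
    """Return a Hugging Face token from Colab secrets or the environment.

    Hypothetical helper for illustration; returns None when nothing is found.
    """
    try:
        # google.colab only exists inside a Colab runtime.
        from google.colab import userdata
        token = userdata.get(secret_name)
        if token:
            return token
    except Exception:
        # Not on Colab, or the secret is missing or not shared with this notebook.
        pass
    # Fall back to a variable exported in the shell or set via os.environ.
    return os.environ.get(secret_name)


token = resolve_hf_token()
if token:
    # Re-export so libraries that read HF_TOKEN from the environment can see it.
    os.environ["HF_TOKEN"] = token
else:
    print("No HF_TOKEN found; set it before downloading the model.")
```

If the helper prints the warning, the model download in the next step will most likely fail with an authentication error, so it is worth fixing the token first.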
Next, we can load the model from Hugging Face. Note that there are some known dependency issues with stable-audio-tools on M1 Macs, so we recommend running this as a Colab notebook (or on a Linux system).
import torch
import torchaudio
# import librosa
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond
import IPython.display as ipd
from functools import partial
device = "cuda" if torch.cuda.is_available() else "cpu"
# Download model
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]
model = model.to(device)
No module named 'flash_attn'
flash_attn not installed, disabling Flash Attention
First, we’ll wrap the sampling code in a simpler wrapper, as there are a few parameters that need to be provided but are not particularly useful to play around with.
# this just cleans things up a bit so the code below highlights the important knobs
easy_generate = partial(generate_diffusion_cond, sample_size=sample_size, sigma_min=0.3, sigma_max=500, device=device)
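If `functools.partial` is unfamiliar, the toy example below shows the same pattern in isolation. `fake_generate` is a made-up stand-in and does not reflect the real signature of `generate_diffusion_cond`; it only illustrates that pre-bound keyword arguments are filled in automatically and can still be overridden at call time.

```python
from functools import partial


def fake_generate(model, steps, cfg_scale, sample_size=0, device="cpu"):
    """Hypothetical stand-in that just reports the arguments it received."""
    return {
        "model": model,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "sample_size": sample_size,
        "device": device,
    }


# Pre-bind the arguments we rarely want to change.
easy_fake = partial(fake_generate, sample_size=1024, device="cpu")

# Only the interesting knobs remain at the call site.
result = easy_fake("toy-model", steps=50, cfg_scale=7.5)
print(result["sample_size"], result["device"])

# A pre-bound keyword can still be overridden for a single call.
override = easy_fake("toy-model", steps=50, cfg_scale=7.5, device="cuda")
print(override["device"])
```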
Next we can define our conditioning, which for the default Stable Audio Open involves text, timing, and overall length.
# Set up text and timing conditioning
conditioning = [{
"prompt": "clean guitar, sweep picking, 140 bpm, G minor",
"seconds_start": 0, # where in the full track this sample starts, in seconds
"seconds_total": 30 # total sample length in seconds; the rest is padded with silence
}]
}]
seed = 1000
n_steps = 50
cfg = 7.5
sampler = "dpmpp-3m-sde"
output = easy_generate(
model,
conditioning=conditioning,
steps=n_steps, # number of diffusion steps to run
cfg_scale=cfg, # classifier-free guidance scale
sampler_type=sampler, # sampling algorithm; see https://github.com/Stability-AI/stable-audio-tools/blob/main/stable_audio_tools/inference/sampling.py#L177 for more options
seed=seed,
)
# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")
# Peak normalize, clip, convert to int16, and trim to the requested length
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()[:, :round(conditioning[0]['seconds_total']*sample_rate)]
1000
/Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages/torch/amp/autocast_mode.py:265: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn(
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/Users/seungheond/anaconda3/envs/p310/lib/python3.10/site-packages/torchsde/_brownian/brownian_interval.py:608: UserWarning: Should have tb<=t1 but got tb=500.00006103515625 and t1=500.000061.
warnings.warn(f"Should have {tb_name}<=t1 but got {tb_name}={tb} and t1={self._end}.")
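To see what the post-processing chain at the end of the generation cell does, here is a plain-Python sketch of the same steps on a toy signal. It is an illustration rather than the notebook's code: `to_int16_pcm` is a hypothetical helper, the sample rate of 4 is deliberately tiny, and lists stand in for the torch tensor.

```python
def to_int16_pcm(samples, num_keep):
    """Peak-normalize floats, scale to the int16 range, and trim to num_keep samples."""
    # Fall back to 1.0 so an all-silent signal does not divide by zero.
    peak = max(abs(s) for s in samples) or 1.0
    out = []
    for s in samples:
        scaled = s / peak                        # loudest sample becomes +/-1.0
        clipped = max(-1.0, min(1.0, scaled))    # guard against rounding overshoot
        out.append(int(clipped * 32767))         # int() truncates toward zero
    return out[:num_keep]


toy_rate = 4             # illustrative value; the real one comes from model_config
toy_seconds = 1          # requested clip length in seconds
raw = [0.1, -0.5, 0.25, 0.0, 0.05, 0.05]   # pretend model output with extra padding

pcm = to_int16_pcm(raw, round(toy_seconds * toy_rate))
print(pcm)
```

The trim at the end matters because the model always produces a fixed-length buffer, so anything past `seconds_total * sample_rate` samples is padding rather than part of the requested clip.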
Now we can listen to the output! Note: if you are running this in a Colab notebook, rendering audio will stop the autosave feature, so be sure to delete the cell outputs if you want to turn it back on.
ipd.display(ipd.Audio(output, rate=sample_rate))
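The cell above only plays the clip. If you also want to keep it on disk, one option (an assumption on our part, not something this notebook does) is to write 16-bit PCM with Python's standard-library `wave` module. The sketch below uses a synthetic tone as stand-in data, and `write_wav_int16` is a hypothetical helper name.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav_int16(path, channels, sample_rate):
    """Write equal-length lists of int16 samples (one list per channel) to a WAV file."""
    num_frames = len(channels[0])
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(len(channels))
        wav_file.setsampwidth(2)              # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        # WAV stores frames interleaved: L0, R0, L1, R1, ...
        interleaved = [ch[i] for i in range(num_frames) for ch in channels]
        wav_file.writeframes(struct.pack(f"<{len(interleaved)}h", *interleaved))


# Stand-in data: one second of a 440 Hz tone in both channels.
demo_rate = 8000
tone = [int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / demo_rate))
        for n in range(demo_rate)]
demo_path = os.path.join(tempfile.gettempdir(), "stable_audio_demo.wav")
write_wav_int16(demo_path, [tone, tone], demo_rate)

# Read the header back to confirm what was written.
with wave.open(demo_path, "rb") as check:
    header = (check.getnchannels(), check.getframerate(), check.getnframes())
print(header)
```

With the notebook's actual int16 tensor, a call such as `torchaudio.save("output.wav", output, sample_rate)` is likely the more direct route, since `torchaudio` is already imported; the filename there is just an example.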